Prediction of human mRNA donor and acceptor sites from the DNA sequence.
Abstract
Artificial neural networks have been applied to the prediction of splice site locations in human pre-mRNA. A joint prediction scheme, in which the prediction of transition regions between introns and exons regulates the cutoff level for splice site assignment, predicted splice site locations with confidence levels far better than previously reported in the literature. Prediction of donor and acceptor sites in human genes is hampered by a large number of false positives; here, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. When the presented method detects 95% of the true donor and acceptor sites, it makes less than 0.1% false donor site assignments and less than 0.4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around one fourth of the true acceptor sites could be detected without any accompanying false positive predictions. Highly confident splice sites could not be isolated with a widely used weight matrix method or with separate splice site networks. A complementary relation was observed between the confidence levels of the coding/non-coding network and the separate splice site networks: many weak splice sites have sharp transitions in the coding/non-coding signal, while many stronger splice sites have more ill-defined transitions between coding and non-coding regions.
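The percentages and the per-site ratios quoted in the abstract can be related by simple arithmetic. The sketch below is only an illustration, not the authors' procedure. It assumes that the false positive rate is the fraction of non-site positions wrongly assigned as splice sites, and that the "false sites per true site" figure is counted against all true sites. The function names are hypothetical, and the abstract's "less than" bounds are treated as exact values.

```python
def false_per_true(false_positive_rate, negatives_per_true_site):
    """Expected false assignments per true site.

    Assumption: the rate applies to non-site positions, and there are
    `negatives_per_true_site` such positions for every true site.
    """
    return false_positive_rate * negatives_per_true_site


def implied_negatives_per_true(false_positive_rate, false_per_true_site):
    """Invert the relation: non-site positions per true site implied by
    a given false positive rate and false-per-true ratio."""
    return false_per_true_site / false_positive_rate


# Figures from the abstract, treated as exact for illustration.
donor = implied_negatives_per_true(0.001, 1.5)     # 0.1%, 1.5 false per true
acceptor = implied_negatives_per_true(0.004, 6.0)  # 0.4%, 6 false per true

print(round(donor), round(acceptor))
```

Under these assumptions both figures imply on the order of 1,500 non-site positions per true site, which shows why even a false positive rate well below 1% still yields more false sites than true ones.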
Similar resources
Overlapping Alternative donor splice Sites in the Human genome
Over 50% of donor splice sites in the human genome have a potential alternative donor site at a distance of three to six nucleotides. Conservation of these potential sites is determined by the consensus requirements and by their exonic or intronic location. Several hundred pairs of overlapping sites are confirmed to be alternatively spliced as both sites in a pair are supported by a protein, by a...
Integrated Model of DNA Sequence Numerical Representation and Artificial Neural Network for Human Donor and Acceptor Sites Prediction
The Human Genome Project has led to a huge inflow of genomic data. Since the completion of human genome sequencing, more and more effort is being put into identifying the splice sites of exons and introns (donor and acceptor sites). This calls on bioinformatics to analyze genome sequences and identify the location of exon and intron boundaries, or in other words to predict splice sites....
Hidden Markov Model for Splicing Junction Sites Identification in DNA Sequences
Identification of coding sequence from genomic DNA sequence is the major step in the pursuit of gene identification. In eukaryotic organisms, gene structure consists of promoter, intron, start codon, exons, stop codon, etc., and identifying it requires accurate labeling of these segments. A splice site is the 'separation' between exons and introns, the predicted accuracy of which is...
The human XPG gene: gene architecture, alternative splicing and single nucleotide polymorphisms.
Defects in the XPG DNA repair endonuclease gene can result in the cancer-prone disorders xeroderma pigmentosum (XP) or the XP-Cockayne syndrome complex. While the XPG cDNA sequence was known, determination of the genomic sequence was required to understand its different functions. In cells from normal donors, we found that the genomic sequence of the human XPG gene spans 30 kb, contains 15 exon...
Mutations that alter RNA splicing of the human HPRT gene: a review of the spectrum.
The human HPRT gene spans approximately 42,000 base pairs in genomic DNA, has an mRNA of approximately 900 bases and a protein coding sequence of 657 bases (initiation codon AUG to termination codon UAA). This coding sequence is distributed into 9 exons ranging from 18 (exon 5) to 184 (exon 3) base pairs. Intron sizes range from 170 (intron 7) to 13,075 (intron 1) base pairs. In a datab...
Journal title:
- Journal of molecular biology
Volume 220, Issue 1
Publication date: 1991